An XML-Based Term Extraction Tool for Basque
Abstract
This project combines linguistic and statistical information to develop a term extraction tool for Basque. Because Basque is an agglutinative and highly inflected language, the treatment of morphosyntactic information is vital. In addition, because the language was unified late, texts show greater term dispersion than texts in a highly normalized language. The result is a semiautomatic, XML-based terminology extraction tool for use in managing technical and scientific information.
Similar resources
Linguistic and Statistical Approaches to Basque Term Extraction
The development of applications for terminology extraction in Basque requires prior research on linguistic techniques in order to meet the requirements of Basque language processing. Because Basque is an agglutinative language, purely statistical methods do not give satisfactory results for term extraction. In this work, we have adopted a hybrid approach, based on the selection of...
ELexBI, a Basic Tool for Bilingual Term Extraction from Spanish-Basque Parallel Corpora
We present the work done by the Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a prior monolingual extraction of term candidates in each language, the...
Computational Lexicography and Lexicology Elexbi, a Basic Tool for Bilingual Term Extraction from Spanish-Basque Parallel Corpora
We present the work done by the Elhuyar Foundation in the field of bilingual terminology extraction. The aim of this work is to develop techniques for the automatic extraction of pairs of equivalent terms from Spanish-Basque translation memories, and to implement those techniques in a prototype. Our approach is based on a monolingual extraction of term candidates in each language, then the creati...
Morphosyntactic structure of terms in Basque for automatic terminology extraction
This paper describes the morphosyntactic patterns of technical terms in Basque and presents an architecture for a term-extracting tool. As Basque is a highly inflected agglutinative language, part-of-speech information is not enough to define term patterns. The use of morphological and syntactic information is essential to considerably reduce the number of structures. For example, a noun, an adv...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we rely mainly on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...